{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "FruitToEmoji-GIT.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f92-4Hjy7kA8",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://www.arduino.cc/\"><img src=\"https://raw.githubusercontent.com/sandeepmistry/aimldevfest-workshop-2019/master/images/Arduino_logo_R_highquality.png\" width=200/></a>\n",
        "# Tiny ML on Arduino\n",
        "## Classify objects by color tutorial\n",
        "\n",
        " \n",
        "https://github.com/arduino/ArduinoTensorFlowLiteTutorials/"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uvDA8AK7QOq-",
        "colab_type": "text"
      },
      "source": [
        "## Setup Python Environment \n",
        "\n",
        "The next cell sets up the dependencies in required for the notebook, run it."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Y2gs-PL4xDkZ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Setup environment\n",
        "!apt-get -qq install xxd\n",
        "!pip install pandas numpy matplotlib\n",
        "%tensorflow_version 2.x\n",
        "!pip install tensorflow"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9lwkeshJk7dg",
        "colab_type": "text"
      },
      "source": [
        "# Upload Data\n",
        "\n",
        "1. Open the panel on the left side of Colab by clicking on the __>__\n",
        "1. Select the Files tab\n",
        "1. Drag `csv` files from your computer to the tab to upload them into colab."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kSxUeYPNQbOg",
        "colab_type": "text"
      },
      "source": [
        "# Train Neural Network\n",
        "\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Gxk414PU3oy3",
        "colab_type": "text"
      },
      "source": [
        "## Parse and prepare the data\n",
        "\n",
        "The next cell parses the csv files and transforms them to a format that will be used to train the full connected neural network.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "AGChd1FAk5_j",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import tensorflow as tf\n",
        "import os\n",
        "import fileinput\n",
        "\n",
        "print(f\"TensorFlow version = {tf.__version__}\\n\")\n",
        "\n",
        "# Set a fixed random seed value, for reproducibility, this will allow us to get\n",
        "# the same random numbers each time the notebook is run\n",
        "SEED = 1337\n",
        "np.random.seed(SEED)\n",
        "tf.random.set_seed(SEED)\n",
        "\n",
        "CLASSES = [];\n",
        "\n",
        "for file in os.listdir(\"/content/\"):\n",
        "    if file.endswith(\".csv\"):\n",
        "        CLASSES.append(os.path.splitext(file)[0])\n",
        "\n",
        "CLASSES.sort()\n",
        "\n",
        "SAMPLES_WINDOW_LEN = 1\n",
        "NUM_CLASSES = len(CLASSES)\n",
        "\n",
        "# create a one-hot encoded matrix that is used in the output\n",
        "ONE_HOT_ENCODED_CLASSES = np.eye(NUM_CLASSES)\n",
        "\n",
        "inputs = []\n",
        "outputs = []\n",
        "\n",
        "# read each csv file and push an input and output\n",
        "for class_index in range(NUM_CLASSES):\n",
        "  objectClass = CLASSES[class_index]\n",
        "  df = pd.read_csv(\"/content/\" + objectClass + \".csv\")\n",
        "  columns = list(df)\n",
        "  # get rid of pesky empty value lines of csv which cause NaN inputs to TensorFlow\n",
        "  df = df.dropna()\n",
        "  df = df.reset_index(drop=True)\n",
        "   \n",
        "  # calculate the number of objectClass recordings in the file\n",
        "  num_recordings = int(df.shape[0] / SAMPLES_WINDOW_LEN)\n",
        "  print(f\"\\u001b[32;4m{objectClass}\\u001b[0m class will be output \\u001b[32m{class_index}\\u001b[0m of the classifier\")\n",
        "  print(f\"{num_recordings} samples captured for training with inputs {list(df)} \\n\")\n",
        "\n",
        "  # graphing\n",
        "  plt.rcParams[\"figure.figsize\"] = (10,1)\n",
        "  pixels = np.array([df['Red'],df['Green'],df['Blue']],float)\n",
        "  pixels = np.transpose(pixels)\n",
        "  for i in range(num_recordings):\n",
        "    plt.axvline(x=i, linewidth=8, color=tuple(pixels[i]/np.max(pixels[i], axis=0)))\n",
        "  plt.show()\n",
        "  \n",
        "  #tensors\n",
        "  output = ONE_HOT_ENCODED_CLASSES[class_index]\n",
        "  for i in range(num_recordings):\n",
        "    tensor = []\n",
        "    row = []\n",
        "    for c in columns:\n",
        "      row.append(df[c][i])\n",
        "    tensor += row\n",
        "    inputs.append(tensor)\n",
        "    outputs.append(output)\n",
        "\n",
        "# convert the list to numpy array\n",
        "inputs = np.array(inputs)\n",
        "outputs = np.array(outputs)\n",
        "\n",
        "print(\"Data set parsing and preparation complete.\")\n",
        "\n",
        "# Randomize the order of the inputs, so they can be evenly distributed for training, testing, and validation\n",
        "# https://stackoverflow.com/a/37710486/2020087\n",
        "num_inputs = len(inputs)\n",
        "randomize = np.arange(num_inputs)\n",
        "np.random.shuffle(randomize)\n",
        "\n",
        "# Swap the consecutive indexes (0, 1, 2, etc) with the randomized indexes\n",
        "inputs = inputs[randomize]\n",
        "outputs = outputs[randomize]\n",
        "\n",
        "# Split the recordings (group of samples) into three sets: training, testing and validation\n",
        "TRAIN_SPLIT = int(0.6 * num_inputs)\n",
        "TEST_SPLIT = int(0.2 * num_inputs + TRAIN_SPLIT)\n",
        "\n",
        "inputs_train, inputs_test, inputs_validate = np.split(inputs, [TRAIN_SPLIT, TEST_SPLIT])\n",
        "outputs_train, outputs_test, outputs_validate = np.split(outputs, [TRAIN_SPLIT, TEST_SPLIT])\n",
        "\n",
        "print(\"Data set randomization and splitting complete.\")\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "v8qlSAX1b6Yv"
      },
      "source": [
        "## Build & Train the Model\n",
        "\n",
        "Build and train a [TensorFlow](https://www.tensorflow.org) model using the high-level [Keras](https://www.tensorflow.org/guide/keras) API."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "kGNFa-lX24Qo",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# build the model and train it\n",
        "model = tf.keras.Sequential()\n",
        "model.add(tf.keras.layers.Dense(8, activation='relu')) # relu is used for performance\n",
        "model.add(tf.keras.layers.Dense(5, activation='relu'))\n",
        "model.add(tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')) # softmax is used, because we only expect one class to occur per input\n",
        "model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])\n",
        "history = model.fit(inputs_train, outputs_train, epochs=400, batch_size=4, validation_data=(inputs_validate, outputs_validate))\n",
        "\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "guMjtfa42ahM",
        "colab_type": "text"
      },
      "source": [
        "### Run with Test Data\n",
        "Put our test data into the model and plot the predictions\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "V3Y0CCWJz2EK",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# use the model to predict the test inputs\n",
        "predictions = model.predict(inputs_test)\n",
        "\n",
        "# print the predictions and the expected ouputs\n",
        "print(\"predictions =\\n\", np.round(predictions, decimals=3))\n",
        "print(\"actual =\\n\", outputs_test)\n",
        "\n",
        "# Plot the predictions along with to the test data\n",
        "plt.clf()\n",
        "plt.title('Training data predicted vs actual values')\n",
        "plt.plot(inputs_test, outputs_test, 'b.', label='Actual')\n",
        "plt.plot(inputs_test, predictions, 'r.', label='Predicted')\n",
        "plt.show()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "j7DO6xxXVCym",
        "colab_type": "text"
      },
      "source": [
        "# Convert the Trained Model to Tensor Flow Lite\n",
        "\n",
        "The next cell converts the model to TFlite format. The size in bytes of the model is also printed out."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0Xn1-Rn9Cp_8",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Convert the model to the TensorFlow Lite format without quantization\n",
        "converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
        "tflite_model = converter.convert()\n",
        "\n",
        "# Save the model to disk\n",
        "open(\"gesture_model.tflite\", \"wb\").write(tflite_model)\n",
        "  \n",
        "import os\n",
        "basic_model_size = os.path.getsize(\"gesture_model.tflite\")\n",
        "print(\"Model is %d bytes\" % basic_model_size)\n",
        "  \n",
        "  "
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ykccQn7SXrUX",
        "colab_type": "text"
      },
      "source": [
        "## Encode the Model in an Arduino Header File \n",
        "\n",
        "The next cell creates a constant byte array that contains the TFlite model. Import it as a tab with the sketch below."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9J33uwpNtAku",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!echo \"const unsigned char model[] = {\" > /content/model.h\n",
        "!cat gesture_model.tflite | xxd -i      >> /content/model.h\n",
        "!echo \"};\"                              >> /content/model.h\n",
        "\n",
        "import os\n",
        "model_h_size = os.path.getsize(\"model.h\")\n",
        "print(f\"Header file, model.h, is {model_h_size:,} bytes.\")\n",
        "print(\"\\nOpen the side panel (refresh if needed). Double click model.h to download the file.\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1eSkHZaLzMId",
        "colab_type": "text"
      },
      "source": [
        "# Realtime Classification of Sensor Data on Arduino\n",
        "\n",
        "Now it's time to switch back to the tutorial instructions and run our new model on the [Arduino Nano 33 BLE Sense](https://www.arduino.cc/en/Guide/NANO33BLE)"
      ]
    }
  ]
}